june 23, 2020

opening

house-keeping

  • unmute your mic to say hello (then mute mic)
  • 4 hours with breaks
  • there will be exercises
  • basics assumed but do ask
  • R and RStudio need to be installed
  • tidyverse (loads ggplot2)

workshop folder

  • go to github: https://github.com/jensroes/visualisation-workshop
  • download repository: visualisation-workshop
  • unpack folder (if downloaded as zip)
  • contents:
    • data
    • exercises
    • scripts
    • slides
    • visualisation-workshop.Rproj
  • double-click on visualisation-workshop.Rproj

outline

  • principles of data visualisation
  • grammar of graphics
  • aesthetics and attributes
  • geometries
  • major tools of data visualisation
  • cosmetics
  • closing remarks
  • references

what is data visualisation?

  • graphical representation of data
  • graphical data analysis (stats): what do we want to know?
  • communication and perception (design): what do we want to communicate?
  • exploratory plots: confirm and analyse data (small specialist audience)
  • explanatory plots: inform and persuade (wide audience)
  • advice: think about the audience

exploring data

library(tidyverse)  # loads readr (read_csv), dplyr and ggplot2, among others
horses <- read_csv("../data/horses.csv")
glimpse(horses)
Rows: 50
Columns: 6
$ X1      <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
$ HorseID <dbl> 97, 156, 56, 139, 65, 184, 88, 182, 101, 135, 35, 39, 198, 10…
$ Price   <dbl> 38000, 40000, 10000, 12000, 25000, 35000, 35000, 12000, 22000…
$ Age     <dbl> 3, 5, 1, 8, 4, 8, 5, 17, 4, 6, 7, 7, 14, 6, 3, 6, 6, 12, 7, 7…
$ Height  <dbl> 16.75, 17.00, NA, 16.00, 16.25, 16.25, 16.50, 16.75, 17.25, 1…
$ Sex     <chr> "m", "m", "m", "f", "m", "f", "m", "f", "m", "f", "m", "f", "…

relationship between age and price

ggplot(data = horses, aes(x = Age, y =  Price)) 

ggplot(data = horses, aes(x = Age, y =  Price)) +
  geom_point()  

ggplot(data = horses, aes(x = Age, y = Price)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

ggplot(data = horses, aes(x = Age, y = Price)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = F ) 
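The quadratic curve drawn by `geom_smooth(method = "lm", formula = y ~ x + I(x^2))` is simply the fitted line of an ordinary quadratic regression (`I()` stops `^` from being read as formula syntax). A minimal sketch to check this, assuming a small made-up data frame `horses_demo` (hypothetical, not the workshop's horses.csv): `layer_data()` returns the curve ggplot2 computed, and `lm()` refits the same model directly.

```r
library(ggplot2)

# Hypothetical toy data (NOT the workshop's horses.csv), for illustration only
horses_demo <- data.frame(
  Age   = c(1, 2, 3, 4, 5, 6, 8, 10, 12, 15),
  Price = c(10000, 22000, 30000, 35000, 38000, 36000, 30000, 22000, 15000, 9000)
)

p <- ggplot(horses_demo, aes(x = Age, y = Price)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE)

# Curve computed by ggplot2: layer 2 is the smooth, evaluated at 80 x values by default
smooth_df <- layer_data(p, 2)

# The same quadratic model fitted directly, predicted at the same x values
fit <- lm(Price ~ Age + I(Age^2), data = horses_demo)
direct <- predict(fit, newdata = data.frame(Age = smooth_df$x))

all.equal(unname(direct), smooth_df$y)
```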

ggplot(data = horses, aes(x = Age, y = Price, colour = Sex)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = F ) 

explanatory plot

ggplot(data = horses, aes(x = Age, y = Price, colour = Sex)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x + I(x^2), fullrange = TRUE) +
  ggthemes::theme_clean() +
  scale_y_continuous(labels = scales::dollar_format(prefix = "$")) +
  ggthemes::scale_color_colorblind(labels = c("Female", "Male") ) +
  labs(y = "Price (in US Dollar)", 
       x = "Age (in years)") +
  theme(legend.position = "bottom",
        legend.justification = "right",
        axis.title = element_text(hjust = 0))

why data visualisation?

“[data visualization] forces us to notice what we never expected to see.” (Tukey 1977)

  • communication of findings
  • persuasion of audience
  • selecting appropriate stats
  • exploring structures in the data (e.g. relationship between two variables)
  • understanding patterns (beyond descriptives)

Anscombe’s quartet

Anscombe (1973) and Tufte (1989)

pivot_longer(data = datasets::anscombe, cols = everything(),
             names_to = c(".value", "Set"), names_pattern = "(.)(.)") %>%
  head(5)
# A tibble: 5 x 3
  Set       x     y
  <chr> <dbl> <dbl>
1 1        10  8.04
2 2        10  9.14
3 3        10  7.46
4 4         8  6.58
5 1         8  6.95

Anscombe’s quartet

Data set   Mean x   SD x   Mean y   SD y
1          9        3.32   7.5      2.03
2          9        3.32   7.5      2.03
3          9        3.32   7.5      2.03
4          9        3.32   7.5      2.03
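The summary statistics in this table can be recomputed from the built-in `datasets::anscombe` data, reusing the `pivot_longer()` call shown earlier. A sketch with dplyr and tidyr (both part of the tidyverse); the correlation column `r` is an extra that is not in the table, and it is also almost identical (about 0.816) across the four sets:

```r
library(dplyr)
library(tidyr)

# Reshape the built-in anscombe data to long format: one row per (Set, x, y)
anscombe_long <- pivot_longer(data = datasets::anscombe, cols = everything(),
                              names_to = c(".value", "Set"),
                              names_pattern = "(.)(.)")

# Per-set summaries: near-identical numbers despite very different scatter plots
anscombe_summary <- anscombe_long %>%
  group_by(Set) %>%
  summarise(mean_x = mean(x), sd_x = sd(x),
            mean_y = mean(y), sd_y = sd(y),
            r = cor(x, y))

anscombe_summary
```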

the datasaurus dozen

Matejka and Fitzmaurice (2017)

principles of data visualisation

basic principles

  • no one-size-fits-all method
  • some methods are more informative than others
  • maximise what we can learn from data

basic principles

  • going beyond summary statistics
  • descriptive summary statistics may conceal / obscure important patterns
  • prevent wrong conclusions about data / theory
  • visualisation helps us to understand patterns, structures, relationships
  • see e.g. Anscombe’s Quartet

basic principles

Hartwig and Dearing (1979)

  • skepticism: any visualization might obscure or misrepresent data
  • openness: there might be patterns and structures that we were not expecting

basic principles

Tufte (1983)

  • above all else show the data
  • avoid distorting what the data have to say
  • present many numbers in a small space
  • encourage the eye to compare different pieces of data
  • reveal data at several levels of detail, from broad overview to fine structures

exercise 1

  • data set: mammals (Allison and Cicchetti 1976; Weisberg 1985)
  • average brain (in g) and body weights (in kg) for 62 species of land mammals.
mammals <- read_csv("../data/mammals.csv") %>%
  rename(species = X1)

glimpse(mammals)
Rows: 62
Columns: 3
$ species <chr> "Arctic fox", "Owl monkey", "Mountain beaver", "Cow", "Grey w…
$ body    <dbl> 3.385, 0.480, 1.350, 465.000, 36.330, 27.660, 14.830, 1.040, …
$ brain   <dbl> 44.50, 15.50, 8.10, 423.00, 119.50, 115.00, 98.20, 5.50, 58.0…

exercise 1

creating (gg)plots in R

  • open script exercises/Exercise 1.R
  • read and follow the instructions in the comments
  • fill in the _____s
  • run your code (not the entire script): CTRL+Enter

grammar of graphics

  • ggplot2 builds on the grammar of graphics (Wickham 2016, 2010)
  • higher-level plotting system compared to base R functions (e.g. plot(), hist())
  • complex visualisations can be created with a minimal amount of code
  • integration of statistical information

grammar of graphics

Wilkinson (1999)

  • graphics are built on an underlying grammar
  • system of rules for mapping variables to graphical properties
  • i.e. ingredients (1) and the recipe (2)
  • principle 1: graphics consist of distinct layers of grammatical elements (data, aesthetics, geometries)
  • principle 2: graphics are built around aesthetic mappings

grammatical elements

  • data: the data frame to plot (e.g. horses)
  • aesthetics: mapping between data and graphic properties (axes, size, colour) indicated as aes()
  • geometries: visual elements encoding the data indicated as geom_…()

ggplot(data = horses, mapping = aes(y = Price, x = Age, colour = Sex))

ggplot(data = horses, mapping = aes(y = Price, x = Age, colour = Sex)) +
  geom_point()

ggplot(data = horses, mapping = aes(y = Price, x = Age, colour = Sex)) +
  geom_smooth(method = "lm") 

ggplot(data = horses, mapping = aes(y = Price, x = Age, colour = Sex)) +
  geom_point() +
  geom_smooth(method = "lm") 
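Each `+` in the examples above adds one layer to a ggplot object, and a plot with only data and aesthetics has no layers to draw (hence the empty panel). A minimal sketch, assuming a small made-up data frame `horses_demo` in place of the workshop data; it inspects the object's `layers` element and uses `layer_data()` to show the data ggplot2 computes for the smooth layer:

```r
library(ggplot2)

# Hypothetical toy data standing in for horses.csv
horses_demo <- data.frame(
  Age   = c(2, 4, 6, 8, 10, 3, 5, 7, 9, 11),
  Price = c(30000, 35000, 32000, 25000, 18000, 28000, 33000, 27000, 20000, 15000),
  Sex   = rep(c("f", "m"), each = 5)
)

# Data + aesthetics only: no geometry yet, so nothing is drawn
p <- ggplot(data = horses_demo, mapping = aes(y = Price, x = Age, colour = Sex))
length(p$layers)   # 0

# Each geom_...() appends one layer that inherits the global aes()
p <- p + geom_point() + geom_smooth(method = "lm", formula = y ~ x)
length(p$layers)   # 2

# Computed data behind the smooth layer: colour = Sex also groups the data,
# so one line (80 points each by default) is fitted per Sex
smooth_df <- layer_data(p, 2)
table(smooth_df$group)
```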

optional grammatical elements

  • facets: dividing data into subplots
  • statistics: summarising representations
  • coordinates: plotting space
  • theme: visual properties not related to the data (font, background)

ggplot2: layers of grammatical elements

  • data
  • aesthetics
ggplot(data = horses, aes(y = Price, x = Age))

ggplot2: layers of grammatical elements

  • data
  • aesthetics
  • geometries
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .5)

ggplot2: layers of grammatical elements

  • data
  • aesthetics
  • geometries
  • facets
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .5) +
  facet_grid( ~ Sex)

ggplot2: layers of grammatical elements

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .5) +
  facet_grid( ~ Sex) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) 

ggplot2: layers of grammatical elements

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
  • coordinates
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .5) +
  facet_grid( ~ Sex) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  coord_fixed(ratio = 2/25000)

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .5) +
  facet_grid( ~ Sex) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  coord_trans(x = "log", y = "reverse")

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .5) +
  facet_grid( ~ Sex) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  coord_flip()

ggplot2: layers of grammatical elements

  • data
  • aesthetics
  • geometries
  • facets
  • statistics
  • coordinates
  • theme
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .25) +
  facet_grid( ~ Sex) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  theme_dark() 

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point(size = .25) +
  facet_grid( ~ Sex) +
  stat_smooth(method = "lm", se = FALSE, fullrange = TRUE) +
  theme(panel.background = element_blank()) 

exercise 2

based on Davis (1990) and Fox and Weisberg (2011)

weight <- read_csv("../data/weight.csv") 
glimpse(weight)
Rows: 6,067
Columns: 8
$ subjectid         <dbl> 10027, 10032, 10033, 10092, 10093, 10115, 10117, 10…
$ gender            <chr> "Male", "Male", "Male", "Male", "Male", "Male", "Ma…
$ height            <dbl> 177.6, 170.2, 173.5, 165.5, 191.4, 172.0, 181.0, 18…
$ height_selfreport <dbl> 180.34, 172.72, 172.72, 167.64, 195.58, 175.26, 182…
$ weight            <dbl> 81.5, 72.6, 92.9, 79.4, 94.6, 80.2, 116.2, 95.4, 99…
$ weight_selfreport <dbl> 81.66969, 72.59528, 93.01270, 79.40109, 96.64247, 7…
$ age               <dbl> 41, 35, 42, 31, 21, 39, 32, 23, 36, 23, 32, 28, 36,…
$ race              <dbl> 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, …

exercise 2

grammatical elements in action

  • open script exercises/Exercise 2a.R
  • read and follow the instructions
  • fill in the ____s
  • run code (not the entire script): CTRL+Enter
  • bonus: exercises/Exercise 2b.R

aesthetics and attributes

  • e.g. colour, fill, size, shape, alpha
  • attributes take fixed values, set outside aes() (e.g. colour = "red")
  • aesthetics map variables, set inside aes() (e.g. aes(colour = Sex))
ggplot(horses, aes(y = Price, x = Age)) +
  geom_point(colour = "red")

ggplot(horses, aes(y = Price, x = Age)) +
  geom_point(aes(colour = "red"))
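The difference between the last two plots can be checked in the computed layer data. A sketch, assuming a hypothetical toy data frame: with the attribute, every point is literally "red"; inside `aes()`, "red" is treated as a one-level variable, so the points get the first colour of the default discrete palette and a legend entry labelled "red".

```r
library(ggplot2)

# Hypothetical toy data
horses_demo <- data.frame(Age = c(2, 5, 8, 11), Price = c(30000, 36000, 24000, 15000))

# Attribute: a fixed value outside aes()
p_attr <- ggplot(horses_demo, aes(y = Price, x = Age)) +
  geom_point(colour = "red")

# Aesthetic: the constant "red" is mapped like a variable with a single level
p_aes <- ggplot(horses_demo, aes(y = Price, x = Age)) +
  geom_point(aes(colour = "red"))

unique(layer_data(p_attr)$colour)  # "red"
unique(layer_data(p_aes)$colour)   # a palette colour, not "red"
```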

ggplot(horses, aes(y = Price, x = Age)) +
  geom_point(aes(colour = Sex))

ggplot(horses, aes(y = Price, x = Age)) +
  geom_point(aes(colour = Sex)) +
  geom_smooth(method = "lm")

ggplot(horses, aes(y = Price, x = Age)) +
  geom_point() +
  geom_smooth(aes(colour = Sex), method = "lm")

ggplot(horses, aes(y = Price, x = Age)) +
  geom_point(aes(colour = Sex)) +
  geom_smooth(aes(colour = Sex), method = "lm")

ggplot(horses, aes(y = Price, x = Age, colour = Sex)) +
  geom_point() +
  geom_smooth(method = "lm")

ggplot(horses, aes(y = Price, x = Age, colour = Sex)) +
  geom_point(size = 2) 

ggplot(horses, aes(y = Price, x = Age, shape = Sex)) +
  geom_point(size = 2)

ggplot(horses, aes(y = Price, x = Age, colour = Sex, shape = Sex)) +
  geom_point(size = 2)

aesthetics

typically x, y, colour, fill, size, alpha, linetype, label

  • some are required by geometries; others are optional
  • continuous vs discrete variables:
    • e.g. shape and linetype can only be mapped to categorical variables
  • use to facilitate comprehension

  • scatterplot: geom_point()
    • x, y, shape, colour, size, fill, alpha, stroke, group
  • barplot: geom_bar()
    • x, y, colour, fill, size, linetype, alpha, group
  • boxplot: geom_boxplot()
    • x, y, lower, xlower, upper, xupper, middle, xmiddle, ymin, xmin, ymax, xmax, weight, colour, fill, size, alpha, shape, linetype, group
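Whether a variable type is allowed for an aesthetic only shows up when the plot is built. A sketch, assuming a hypothetical toy data frame and a helper `builds_ok()` (an illustrative name, not part of ggplot2) that reports whether `ggplot_build()` succeeds: a categorical variable can be mapped to shape, a continuous one cannot, while colour accepts both.

```r
library(ggplot2)

# Hypothetical toy data: Age is continuous, Sex is categorical
horses_demo <- data.frame(Age = c(2, 5, 8, 11),
                          Price = c(30000, 36000, 24000, 15000),
                          Sex = c("f", "m", "f", "m"))

# Hypothetical helper: TRUE if the plot builds without an error
builds_ok <- function(p) {
  tryCatch({ ggplot_build(p); TRUE }, error = function(e) FALSE)
}

# Categorical variable mapped to shape: fine
ok_shape_cat <- builds_ok(ggplot(horses_demo, aes(x = Age, y = Price, shape = Sex)) +
                            geom_point())

# Continuous variable mapped to shape: ggplot2 stops with an error
ok_shape_cont <- builds_ok(ggplot(horses_demo, aes(x = Age, y = Price, shape = Age)) +
                             geom_point())

# Continuous variable mapped to colour: fine, drawn as a gradient
ok_colour_cont <- builds_ok(ggplot(horses_demo, aes(x = Age, y = Price, colour = Age)) +
                              geom_point())

c(ok_shape_cat, ok_shape_cont, ok_colour_cont)
```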

decoding of continuous variables

(Wong 2010, 665)

  • position on a common scale
  • position on the same but nonaligned scales
  • lengths
  • angles, slopes
  • areas
  • volume, monochromatic colour spectrum (saturation, grey scale)
  • pure spectrum colours

decoding of continuous variables

position on common scale

ggplot(data = horses, aes(x = Age, y = Price)) +
  geom_point(size = 3) +
  facet_grid(~Sex)

decoding of continuous variables

position on non-aligned scales

ggplot(data = horses, aes(x = Age, y = Price)) +
  geom_point(size = 3) +
  facet_wrap(~Sex, scales = "free_y")

decoding of continuous variables

colour spectrum

ggplot(data = horses, aes(x = Age, y = Sex, colour = Price)) +
  geom_point(size = 3) 

decoding of continuous variables

area (size)

ggplot(data = horses, aes(x = Age, y = Sex, size = Price)) +
  geom_point() 

decoding of categorical variables (groups)

ggplot(data = horses, aes(x = Age, y = Price, 
                          colour = Sex)) +
  geom_point(size = 3) 

  • qualitative colours, labels, line colours
  • sequential colours, shape outlines, line type
  • filled shapes, hatching (shading with lines), line width

ggplot(data = horses, aes(x = Age, y = Price, 
                          label = Sex)) +
  geom_text(size = 3)

ggplot(data = horses, aes(x = Age, y = Price,
                          shape = Sex)) +
  geom_point(size = 3)

ggplot(data = horses, aes(x = Age, y = Price, 
                          colour = Sex)) +
  geom_smooth(method = "lm", se = F)

ggplot(data = horses, aes(x = Age, y = Price, 
                          linetype = Sex)) +
  geom_smooth(method = "lm", se = F)

ggplot(data = horses, aes(x = Age, y = Price, 
                          size = Sex)) +  # ggplot2 >= 3.4.0: use linewidth = Sex for lines
  geom_smooth(method = "lm", se = F)

exercise 3

practice aesthetics and attributes

  • open script exercises/Exercise 3a.R
  • read and follow the instructions
  • fill in the _____s
  • continue with exercises/Exercise 3b.R
  • and exercises/Exercise 3c.R

major visualisation tools

  • visual encoding of the aesthetics layer
  • ~50 geometries: geom_…
 [1] "abline"         "area"           "bar"            "bin2d"         
 [5] "blank"          "boxplot"        "col"            "column"        
 [9] "contour"        "contour_filled" "count"          "crossbar"      
[13] "curve"          "density"        "density_2d"     "density2d"     
[17] "dotplot"        "errorbar"       "errorbarh"      "freqpoly"      
[21] "hex"            "histogram"      "hline"          "jitter"        
[25] "label"          "line"           "linerange"      "map"           
[29] "path"           "point"          "pointrange"     "polygon"       
[33] "qq"             "qq_line"        "quantile"       "raster"        
[37] "rect"           "ribbon"         "rug"            "segment"       
[41] "sf"             "sf_label"       "sf_text"        "smooth"        
[45] "spoke"          "step"           "text"           "tile"          
[49] "violin"         "vline"         
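The list above can be regenerated for whichever ggplot2 version is installed; the exact count changes between releases. A sketch that lists the exported `geom_*()` functions of the attached package:

```r
library(ggplot2)

# All exported geom_*() functions in the installed ggplot2 version
geom_funs <- ls("package:ggplot2", pattern = "^geom_")

# Strip the prefix to get short names like the list above
geom_names <- sub("^geom_", "", geom_funs)
length(geom_names)
head(geom_names)
```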

  • more geometries in other packages such as tidybayes and ggridges
  • many can be combined
  • depends on visualisation goal
  • and your subject domain
  • three important groups:
    • bivariate distributions
    • univariate distributions
    • group comparisons

bivariate distribution

  • function: relationship between two variables
  • variable type: typically continuous
  • examples: scatter plot, time series

figures: scatter plot, time series

univariate distribution

  • function: distribution of values
  • variable type: continuous or discrete
  • examples: histograms, density plots, bar plots
ggplot(data = horses, aes(x = Sex)) +
  geom_bar()

ggplot(data = horses, aes(x = Price)) +
  geom_histogram()

ggplot(data = horses, aes(x = Price)) +
  geom_density()
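Without `bins` or `binwidth`, `geom_histogram()` uses 30 bins and prints a message suggesting a better value; setting the bin width explicitly makes that choice deliberate. A sketch with hypothetical toy prices (not the workshop data), checking that every observation ends up in exactly one bin:

```r
library(ggplot2)

# Hypothetical toy prices, for illustration only
horses_demo <- data.frame(Price = c(8000, 10000, 12000, 12000, 15000, 20000,
                                    22000, 25000, 35000, 38000, 40000))

# Explicit bin width of 5000 dollars instead of the default 30 bins
p <- ggplot(horses_demo, aes(x = Price)) +
  geom_histogram(binwidth = 5000)

# One row per bin; the counts add up to the number of observations
bins_df <- layer_data(p)
sum(bins_df$count)
```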

group comparisons

  • function: distribution of values for two or more groups (often closely tied to statistical descriptions)
  • variable type: continuous
  • examples: points / jitter, box plot, violin plot, barplot (pie chart), dynamite plots

dynamite plot and pitfalls thereof

  • does it suggest a normal distribution?
  • does it suggest the same number of observations in each group?
  • do the bars suggest data where there are none?
  • are there really no values above the error bar?

figures: dynamite plots; points; jittered points; jittered points and error bars; box-and-whiskers plot (Tukey 1977)
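One alternative to a dynamite plot is to show every observation and overlay the group summary. A sketch, assuming hypothetical toy data with unequal group sizes: points are jittered horizontally only (so prices stay exact), and `stat_summary()` with ggplot2's `mean_se()` adds the mean plus or minus one standard error per group.

```r
library(ggplot2)

# Hypothetical toy data: two groups of unequal size
horses_demo <- data.frame(
  Sex   = c(rep("f", 4), rep("m", 6)),
  Price = c(12000, 15000, 22000, 35000, 10000, 25000, 30000, 38000, 40000, 12000)
)

p <- ggplot(horses_demo, aes(x = Sex, y = Price)) +
  # every observation; horizontal jitter only, fixed seed for reproducibility
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1),
             alpha = 0.6) +
  # mean +/- one standard error per group
  stat_summary(fun.data = mean_se, geom = "pointrange", colour = "red")

summary_df <- layer_data(p, 2)
summary_df[, c("x", "y", "ymin", "ymax")]
```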

exercise 4

major visualisation tools

  • open script exercises/Exercise 4a.R
  • read and follow the instructions
  • fill in the _____s
  • continue with exercises/Exercise 4b.R

cosmetics

changing text: labs

  • title
  • subtitle
  • caption
  • tag
  • x
  • y
  • colour, shape etc
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() + 
  labs()

ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() + 
  labs(title = "My scatter plot")

ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() + 
  labs(title = "My scatter plot", 
       subtitle = "I'm a subtitle")

ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() + 
  labs(caption = "Caption for data source")

ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() + 
  labs(tag = "A")

ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() + 
  labs(x = "Age of horse", 
       y = "Price of horse in $")

ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() + 
  labs(colour = "Legend\ntitle:")

changing text: legend keys

  • scale_colour_discrete
  • scale_colour_continuous
  • scale_colour_manual
  • or any other aesthetic instead of colour
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() +  
  scale_colour_discrete(labels = c("female", "male")) 

changing text: legend keys

  • change colour values manually
  • colour names: colors() lists the built-in names
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() +  
  scale_colour_manual(labels = c("female", "male"),
                      values = c("darkseagreen", "firebrick"))

ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() +  
  scale_colour_manual(labels = c("female", "male"),
                      values = c("darkseagreen3", "firebrick1"))

# Okabe-Ito colour-blind-friendly palette
mycolours <- c("#000000", "#E69F00", "#56B4E9",
               "#009E73", "#F0E442", "#0072B2",
               "#D55E00", "#CC79A7")

ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() +  
  scale_colour_manual(labels = c("female", "male"),
                      values = mycolours[c(1,2)])
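With an unnamed `values` vector, colours are assigned in the order of the factor levels (alphabetical for a character variable: f, then m). A sketch, assuming a hypothetical toy data frame and an illustrative named vector `mycolours_named` built from two colours of the mycolours palette above, which ties each colour to a specific level regardless of order:

```r
library(ggplot2)

# Hypothetical toy data: three females, two males
horses_demo <- data.frame(
  Age   = c(2, 5, 8, 11, 14),
  Price = c(30000, 36000, 24000, 15000, 9000),
  Sex   = c("f", "f", "f", "m", "m")
)

# Names match the data values, so the pairing does not depend on level order
mycolours_named <- c(f = "#E69F00", m = "#56B4E9")

p <- ggplot(horses_demo, aes(y = Price, x = Age, colour = Sex)) +
  geom_point() +
  scale_colour_manual(labels = c("female", "male"), values = mycolours_named)

# Colours actually assigned to the points
table(layer_data(p)$colour)
```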

changing text: legend keys

  • change colour values manually
  • colour names: link
  • ggthemes
ggplot(data = horses, aes(y = Price, x = Age,
                          colour = Sex)) +
  geom_point() +  
  ggthemes::scale_colour_colorblind(labels = c("female", "male"))

changing text: strips

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex)

horses$Sex <- recode(horses$Sex, f = "female", m = "male")

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex)

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex, labeller = label_both)
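Instead of recoding the data, strip labels can also be renamed with a lookup labeller built by ggplot2's `as_labeller()`. A sketch, assuming a hypothetical toy data frame that still holds the raw codes "f" and "m"; `sex_labeller` is an illustrative name:

```r
library(ggplot2)

# Hypothetical toy data with raw (not recoded) Sex values
horses_demo <- data.frame(
  Age   = c(2, 5, 8, 11),
  Price = c(30000, 36000, 24000, 15000),
  Sex   = c("f", "m", "f", "m")
)

# Lookup table: value in the data -> text shown in the strip
sex_labeller <- as_labeller(c(f = "female", m = "male"))

p <- ggplot(horses_demo, aes(y = Price, x = Age)) +
  geom_point() +
  facet_grid(~Sex, labeller = sex_labeller)

# What the strips will show for each level
sex_labeller(data.frame(Sex = c("f", "m")))
```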

themes

  • specify the appearance of non-data ink (e.g. fonts, background, grid lines)
[1] "theme_bw"       "theme_classic"  "theme_dark"     "theme_grey"    
[5] "theme_light"    "theme_linedraw" "theme_minimal"  "theme_void"    
  • e.g. ggthemes for more
  • set default: theme_set(theme_minimal())
  • adjust base font: e.g. base_size = 14

themes

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + facet_grid(~Sex) +
  theme_grey(base_size = 14)

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + facet_grid(~Sex) +
  theme_minimal(base_size = 14)

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + facet_grid(~Sex) +
  theme_light(base_size = 14)

themes

  • axis
  • legend
  • panel
  • plot
  • strip
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme()

themes: axis

  • axis.text
    • axis.text.x
    • axis.text.y
  • axis.title
    • axis.title.x
    • axis.title.y
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme(axis.text = element_text(face = "bold"))

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme(axis.title = element_text(face = "bold"))

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme(axis.title.y = element_text(face = "bold"))

themes: legend

  • legend.background
  • legend.margin
  • legend.spacing
  • legend.key
  • legend.text
  • legend.title
  • legend.position
  • legend.orientation
  • legend.justification
  • legend.box
ggplot(data = horses, aes(y = Price, x = Age, 
                          colour = Sex)) +
  geom_point() + 
  theme()

ggplot(data = horses, aes(y = Price, x = Age, 
                          colour = Sex)) +
  geom_point() + 
  theme(legend.position = "top")

ggplot(data = horses, aes(y = Price, x = Age, 
                          colour = Sex)) +
  geom_point() + 
  theme(legend.position = "top", 
        legend.justification = "right")

ggplot(data = horses, aes(y = Price, x = Age, 
                          colour = Sex)) +
  geom_point() + 
  theme(legend.position = c(.9,.8))  # ggplot2 >= 3.5.0: legend.position = "inside", legend.position.inside = c(.9, .8)

themes: panel

  • panel.background
  • panel.border
  • panel.spacing
  • panel.grid
    • panel.grid.major
    • panel.grid.minor
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme()

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme(panel.background = element_blank())

themes: plot

  • plot.background
  • plot.title
  • plot.subtitle
  • plot.caption
  • plot.tag
  • plot.margin
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() +
  theme()

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme(plot.background = element_rect(fill = "pink"))

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  labs(title = "I'm a title") +
  theme(plot.title = element_text(colour = "pink"))

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  labs(caption = "I'm a caption") +
  theme(plot.caption = element_text(face = "italic"))

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  theme(plot.margin = unit(c(2,2,2,2), "cm"))  # top, right, bottom, left

themes: facet strips

  • strip.background
  • strip.placement
  • strip.text
ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex, labeller = label_both) +
  theme()

themes: strip.background

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex, labeller = label_both) +
  theme(strip.background = element_blank())

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex, labeller = label_both) +
  theme(strip.background = element_rect(fill = "forestgreen"))

themes: strip.text

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex, labeller = label_both) +
  theme(strip.background = element_rect(fill = "forestgreen"),
        strip.text = element_text(colour = "white", hjust = 0))

ggplot(data = horses, aes(y = Price, x = Age)) +
  geom_point() + 
  facet_grid(~Sex, labeller = label_both) +
  theme(strip.background = element_rect(fill = "forestgreen"),
        strip.text = element_text(colour = "white", hjust = 0, 
                                  face = "bold", size = 16, angle = 180))

saving your plot

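  • ggsave("name of plot.png", width = 5, height = 5) saves the last plot shown
  • width and height are in inches unless units is set (e.g. units = "cm")
  • file types: pdf, png, tiff, jpg (chosen from the file extension)
  • sizes require some manual adjustment
  • keep the aspect ratio sensible
  • or use the Export button in the RStudio Plots pane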
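A sketch of saving a specific plot object rather than the last one displayed, assuming a hypothetical toy data frame; it writes to a temporary PDF (so nothing is left in the project folder) and gives the size in centimetres:

```r
library(ggplot2)

# Hypothetical toy data
horses_demo <- data.frame(Age = c(2, 5, 8, 11), Price = c(30000, 36000, 24000, 15000))

p <- ggplot(horses_demo, aes(y = Price, x = Age)) +
  geom_point()

# The device is inferred from the extension (.pdf); size given in cm
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p, width = 12, height = 8, units = "cm")

file.exists(out_file)
```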

exercise 5

bringing everything together

  • Up-to-date COVID-19 data
  • open script exercises/Exercise 5a.R
  • read and follow the instructions
  • fill in the _____s
  • continue with exercises/Exercise 5b.R

closing remarks

useful resources

references

Allison, Truett, and Domenic V. Cicchetti. 1976. “Sleep in Mammals: Ecological and Constitutional Correlates.” Science 194 (4266). American Association for the Advancement of Science: 732–34.

Anscombe, Francis J. 1973. “Graphs in Statistical Analysis.” The American Statistician 27. Taylor & Francis Group: 17–21.

Davis, Caroline. 1990. “Body Image and Weight Preoccupation: A Comparison Between Exercising and Non-Exercising Women.” Appetite 15 (1). Elsevier: 13–21.

Fox, John, and Sanford Weisberg. 2011. An R Companion to Applied Regression. Vol. 2. Sage.

Hartwig, Frederick, and Brian E. Dearing. 1979. Exploratory Data Analysis. 16. Sage.

Matejka, Justin, and George Fitzmaurice. 2017. “Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics Through Simulated Annealing.” In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 1290–94.

Tufte, Edward R. 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.

———. 1989. The Visual Display of Quantitative Information. Vols. 13–14. Graphics Press.

Tukey, John W. 1977. Exploratory Data Analysis. Vol. 2.

Weisberg, S. 1985. Applied Linear Regression. Vol. 2. New York: John Wiley.

Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19 (1). Taylor & Francis: 3–28.

———. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer.

Wilkinson, Leland. 1999. The Grammar of Graphics. Springer.

Wong, Bang. 2010. “Points of View: Design of Data Figures.” Nature Methods 7 (9). Nature Publishing Group: 665.